Phonetic transcriptions in the Spoken Dutch Corpus: how to combine efficiency and good transcription quality

Authors

  • Catia Cucchiarini
  • Diana Binnenpoorte
  • Simo M. A. Goddijn

Abstract

This paper reports on an experiment aimed at establishing how phonetic transcriptions for the large CGN corpus can be obtained most efficiently. This experiment explores the potential of an automatically generated transcription (AGT) by comparing an AGT with a reference transcription (Tref) of the same material, to determine whether and how the AGT can be improved to make it more similar to Tref. The results indicate that the AGT can be optimized through pronunciation variation modelling so as to make human corrections more efficient or even superfluous, at least for some speech styles.
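The abstract does not state which measure was used to compare the AGT with Tref. One common way to compare two phonetic transcriptions is to align them phone by phone with a minimum-edit-distance alignment and count substitutions, deletions and insertions relative to the reference. The minimal Python sketch below assumes that each transcription is a space-separated string of phone symbols; the function names `align_counts` and `disagreement` and the example SAMPA-style phone strings are hypothetical illustrations, not CGN data or the authors' procedure.

```python
from typing import List, Sequence, Tuple

# Each DP cell holds (edit cost, substitutions, deletions, insertions).
Cell = Tuple[int, int, int, int]


def align_counts(hyp: Sequence[str], ref: Sequence[str]) -> Tuple[int, int, int, int]:
    """Align hyp (e.g. an AGT) against ref (e.g. Tref) by minimum edit distance.

    Returns (substitutions, deletions, insertions, number of reference phones).
    """
    n, m = len(ref), len(hyp)
    dp: List[List[Cell]] = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference phone deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis phone inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if ref[i - 1] == hyp[j - 1] else 1
            c, s, d, ins = dp[i - 1][j - 1]
            diagonal = (c + mismatch, s + mismatch, d, ins)  # match or substitution
            c, s, d, ins = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, ins)  # reference phone missing from hyp
            c, s, d, ins = dp[i][j - 1]
            insertion = (c + 1, s, d, ins + 1)  # extra phone in hyp
            # min() compares total cost first; the other fields only break ties.
            dp[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = dp[n][m]
    return subs, dels, ins, n


def disagreement(hyp: str, ref: str) -> float:
    """Percentage disagreement 100 * (S + D + I) / N, with N = reference length."""
    subs, dels, ins, n = align_counts(hyp.split(), ref.split())
    if n == 0:
        raise ValueError("reference transcription is empty")
    return 100.0 * (subs + dels + ins) / n


if __name__ == "__main__":
    tref = "n a t y r l @ k"  # hypothetical reference transcription
    agt = "n a t y l @ k"     # hypothetical AGT lacking one phone
    print(align_counts(agt.split(), tref.split()))  # (0, 1, 0, 8)
    print(disagreement(agt, tref))                  # 12.5
```

Under this assumed measure, a lower disagreement score between an optimized AGT and Tref would suggest that fewer manual corrections are needed, which is the kind of improvement the abstract reports.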


Similar resources

How to Improve Human and Machine Transcriptions of Spontaneous Speech

This paper reports on an experiment aimed at measuring the quality of automatic and human phonetic transcriptions of different speech styles that were produced within the framework of a large speech corpus project for Dutch, the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN). The results indicate that the procedure adopted in the CGN to improve the quality of phonetic transcriptions...


Automatic Phonetic Transcription of Large Speech Corpora

Most large speech corpora are delivered with a lexicon that contains a canonical transcription of every word in the orthographic transcription. Such a lexicon can be used for generating a hypothetical ‘canonical’ phonetic transcription from the orthography. In addition, time and money permitting, some speech corpora are provided with a manually verified broad phonetic transcription of at least ...


Regional Bias in the Broad Phonetic Transcriptions of the Spoken Dutch Corpus

In this paper, we assess an aspect of the quality of the broad phonetic transcriptions in the Spoken Dutch Corpus (CGN). The corpus contains speech from native speakers of Dutch originating from The Netherlands and the Dutch speaking part of Belgium. The phonetic transcriptions were made by transcribers from both regions. In previous research, we have identified regional differences in the tran...


Automatic generation of phonetic transcriptions for large speech corpora

We describe a method for the automatic production of phonetic transcriptions in large speech corpora. First, we focus on the application of different techniques for the generation of pronunciation variants. Then, we explain the application of a speech recognition system for selecting the acoustically best matching phonetic transcription. The system is evaluated on different test sets selected f...


The Influence of the Labeller's Regional Background on Phonetic Transcriptions: Implications for the Evaluation of Spoken Language Resources

Phonetic transcriptions of spoken language corpora are not an exact written reproduction of the speech signal. They are influenced by a variety of factors such as the transcriber's native categorical perception. What remains unexplored is to what extent variation of perception within the same language exerts any influence on phonetic transcriptions. We report a case study of the labelling of vo...




Publication date: 2001